Skip to content

Conversation

@LaszloLango
Copy link
Contributor

@LaszloLango LaszloLango commented Sep 3, 2018

Removed 'jerry_parse_and_save_literals' and introduced
'jerry_get_literals_from_snapshot' instead which works
on snapshot buffers rather than source code. Added literal
saving feature to snapshot merge in the snapshot command
line tool. Also added missing 'jerry_cleanup()' calls to the
snapshot tool. Improved the console messages of the snapshot
tool.

Based on previous work of Tamas Zakor ztamas@inf.u-szeged.hu

JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com

@LaszloLango LaszloLango added enhancement An improvement api Related to the public API snapshot Related to the snapshot feature labels Sep 3, 2018
@LaszloLango LaszloLango added this to the Release 2.0 milestone Sep 3, 2018
@LaszloLango LaszloLango mentioned this pull request Sep 3, 2018

Collect the used literals from the given source code and save them into a specific file in a list or C format.
These literals are generated by the parser, they are valid identifiers and none of them are magic string.
Collect the used literals from the given snapshot and save them into a buffer in a list or C format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... in list or C format.

Collect the used literals from the given source code and save them into a specific file in a list or C format.
These literals are generated by the parser, they are valid identifiers and none of them are magic string.
Collect the used literals from the given snapshot and save them into a buffer in a list or C format.
None of these literals are magic string. In C format only valid identifiers are collected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... are magic strings.

sizeof (save_literal_buffer) / sizeof (uint32_t),
true);
jerry_value_t generate_result;
generate_result = jerry_generate_snapshot (NULL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed to split this into two statements? I've checked, jerry_value_t generate_result = jerry_generate_snapshot (... fits in the line length limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed indeed, but I think it looks a bit better. I can change if you want. Both solutions are fine to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variables should be initialized at the time of their declaration whenever possible. This may be called a personal preference. It's no blocker but wanted to make a note.

bool is_c_format) /**< format-flag */
jerry_get_literals_from_snapshot (const uint32_t *snapshot_p, /**< input snapshot buffer */
size_t snapshot_size, /**< size of the input snapshot buffer */
uint32_t *lit_buf_p, /**< [out] buffer to save literals to */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is lit_buf_p an uint32_t * if it stores a string? It will be cast to uint8_t * in the code anyway. (Question nr.2: why not to char * or jerry_char_t *?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It comes from the original code. I think it is because the snapshot command line tool, because we use an uint32_t output buffer for both snapshots and literals. char * would not be the best choice, because the output is not a zero terminated string and it can be misleading. IMHO jerry_char_t * would be the best choice. I'll update the PR.

sizeof (output_buffer) / sizeof (uint32_t),
is_save_literals_mode_in_c_format);
if (literal_buffer_size == 0)
uint32_t *lit_out_buf_p = output_buffer + snapshot_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_buffer is an uint32_t array, snapshot_size seems to be a expressed in bytes. Pointer arithmetics will cause the addition to be off by a factor of 4 than expected.

is_save_literals_mode_in_c_format);
if (literal_buffer_size == 0)
uint32_t *lit_out_buf_p = output_buffer + snapshot_size;
size_t lit_out_buf_sz = (sizeof (output_buffer) / sizeof (uint32_t)) - snapshot_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expression also seems to be incorrect. sizeof (output_buffer) is already in bytes, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra note / question: What ensures that the output of jerry_get_literals_from_snapshot will fit in the buffer of size lit_out_buf_sz? And how could we detect if it didn't?

.help = "export literals found in parsed JS input (in list format)"),
CLI_OPT_DEF (.id = OPT_MERGE_LITERAL_C, .longopt = "save-literals-c-format",
.meta = "FILE",
.help = "export literals found in parsed JS input (in C source format)"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to have a separate subcommand for extracting literals from a snapshot instead of adding export options to both generate and merge subcommands? That would reduce code duplications also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer it this way, but we can change it. Do you mean something like this:
jerry-snapshot literal --format=list -o literal.list FILE_1 ... FILE_N ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking something like this. It would have several benefits, IMO.

The subcommands of the snapshot tool would work as analogies to well known tools from the native domain, e.g., 'generate' is like a compiler, 'merge' resembles a linker, and this 'literal' (or however it is named) could be the counterpart of objdump. (Ofc, none of them are perfect matches, there are important differences everywhere.)

There is a reason why these build tools are separated. This way a user doesn't have to execute unnecessary functionalities to perform a given step. E.g., in this case, assumig that a snapshot is already present but the user forgot to ask for a literal dump at the time of its creation, they don't have to recreate the snapshot from source just to get that dump but it can be retrieved from the snapshot itself.

All this is also aligned with the linux command line tool philosophy: a tool (now, subcommand) should do one thing but do that right. (I.e., don't clone the same functionality into different tools but put it into a single one and leave the functionality composition via sequential execution or pipes or whatever to the user.)

@LaszloLango LaszloLango force-pushed the snapshot-merge-save-literals-dev branch from 75ebbed to f1c1596 Compare September 4, 2018 11:07
@LaszloLango LaszloLango changed the title Improve API of literal save Improve API of literal save and the snapshot (command line) tool Sep 4, 2018
@LaszloLango LaszloLango force-pushed the snapshot-merge-save-literals-dev branch from f1c1596 to 9809103 Compare September 4, 2018 11:09
@LaszloLango
Copy link
Contributor Author

@akosthekiss thanks for your review. I've updated the PR.

@LaszloLango LaszloLango force-pushed the snapshot-merge-save-literals-dev branch 2 times, most recently from 7e85b71 to ecc0e2b Compare September 4, 2018 13:35
bool is_c_format);
jerry_get_literals_from_snapshot (const uint32_t *snapshot_p,
size_t snapshot_size,
uint8_t *lit_buf_p,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .c file declares this as jerry_char_t.

ecma_collection_header_t *lit_pool_p = ecma_new_values_collection ();
ecma_save_literals_add_compiled_code (bytecode_data_p, lit_pool_p);
JERRY_ASSERT ((header_p->lit_table_offset % sizeof (uint32_t)) == 0);
const uint8_t *literal_base_p = (uint8_t *) (snapshot_data_p + header_p->lit_table_offset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot_data_p is already a uint8_t *. Is this explicit cast to the same type needed?

ecma_free_value (JERRY_CONTEXT (error_value));
return 0;
ecma_raise_type_error ("Invalid snapshot format");
return ecma_create_error_reference_from_context ();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this function here. But how can its return value be compatible with size_t? And even if it is a number, how can it have any error signalling power? If it is non-zero, it will be interpreted as a buffer size. If it is zero, well, it cannot encode anything useful then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, it looks like it is messed up after the rebase. I'll fix it.

ecma_bytecode_deref (bytecode_data_p);
ecma_collection_header_t *lit_pool_p = ecma_new_values_collection ();
scan_snapshot_functions (snapshot_data_p + header_p->func_offsets[0],
snapshot_data_p + header_p->lit_table_offset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equivalent to literal_base_p?

{
is_c_format = true;
}
else if (strcmp ("list", fromat_str_p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest using else if (!strcmp ("list", format_str_p)) { is_c_format = false; } else { ... return FAIL; }. Although list is the default format, an intentionally weird command line might cause counterintuitive behaviour with the current approach: jerry-snapshot --format c --format list will not switch back to list format. (A cli option may appear multiple times and the last one counts, usually.)

};

/**
* Process 'literal-dump' command.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

literal-dump as a subcommand name is a bit long to my taste but I let others comment on this. (dump, litdump, literals?)

if (literal_file_p == NULL)
if (number_of_files < 1)
{
jerry_port_log (JERRY_LOG_LEVEL_ERROR, "Error: at least one input files must be passed.\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...one input file... ...specified.

{
/* The input contains more than one input snapshot file, so we must merge them first. */
const char *error_p = NULL;
size_t merged_snapshot_size = jerry_merge_snapshots (snapshot_buffers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Cannot it be specified that literal dump works on one snapshot only? Doing merge in this subcommand seems to go against the 'do one thing but do that right' idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, yes it is necessary. This is how we usually going to use it. E.g.: in IoT.js. It doesn't save the merged snapshot and does not modify the input, so I think it does only one thing (gather the literals). I don't see any problem with this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the auto-merge. Snapshot tool is not intended to run in low-end, so adding convenience functions should not hurt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: this code could become simpler if jerry_merge_snapshots could deal with corner cases like number_of_files=1. I recall that there was some discussion about this back when the snapshot tool was first implemented. one more item on a todo list?


if (literals_file_name_p == NULL)
{
literals_file_name_p = is_c_format ? "literals.h" : "literals.list";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cli option comment documents the c format case to write its output to literals.c not .h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the option comment.

@LaszloLango LaszloLango force-pushed the snapshot-merge-save-literals-dev branch from ecc0e2b to 2547ed9 Compare September 5, 2018 06:41
@LaszloLango
Copy link
Contributor Author

@akosthekiss thanks for your review. I've updated the PR.

Collect the used literals from the given source code and save them into a specific file in a list or C format.
These literals are generated by the parser, they are valid identifiers and none of them are magic string.
Collect the used literals from the given snapshot and save them into a buffer in list or C format.
None of these literals are magic strings. In C format only valid identifiers are collected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the C limitation? It would not be bad to control this with a flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Can be done later if backward incompatible API changes can be avoided)

Copy link
Contributor Author

@LaszloLango LaszloLango Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that special unicode characters break the C format. It is deficiency of the current version (latest master) as well. I'd like to work on it in a followup if it is good for you.

*/
static void
scan_snapshot_functions (const uint8_t *buffer_p, /**< snapshot buffer start */
const uint8_t *buffer_end_p, /**< snapshot buffer end */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't really like removing this, but since this is a static function we can reintroduce it later if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me hm, too. I didn't want to suggest the removal of this argument, although I see how my comment has led here. I should dive a bit deeper into snapshot functions to check the semantics behind buffer_end_p and literal_base_p, where and when and how they are used, why they are the same at all/most call sites, how can the literal data start after the snapshot buffer, etc. If they are really always equal then that may hint at a conceptual problem (or that may be too harsh, something that could be optimised, or named better, etc).

So I +1 now to no removal and to switch back to the original redundant arguments. (But someone could put this on a todo list and take a closer look.)

if (snapshot_size <= sizeof (jerry_snapshot_header_t))
{
ecma_free_value (JERRY_CONTEXT (error_value));
/* Invalid snapshot format */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a few sanity checks, for example the jerry magic string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now, I've added more checks.

{
/* The input contains more than one input snapshot file, so we must merge them first. */
const char *error_p = NULL;
size_t merged_snapshot_size = jerry_merge_snapshots (snapshot_buffers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the auto-merge. Snapshot tool is not intended to run in low-end, so adding convenience functions should not hurt.

@LaszloLango LaszloLango force-pushed the snapshot-merge-save-literals-dev branch 2 times, most recently from 57d93e5 to 1cafb83 Compare September 5, 2018 12:21
{
FILE *literal_file_p = fopen ("literals.txt", "w");
fwrite (save_literal_buffer, sizeof (uint8_t), literal_sizes, literal_file_p);
FILE *literal_file_p = fopen ("literals.txt", "wb");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: jerry_get_literals_from_snapshot is called with is_c_format=true, so the .txt extension is a bit counterintuitive in the example. (this is nothing functional, just an observation)

*/
static void
scan_snapshot_functions (const uint8_t *buffer_p, /**< snapshot buffer start */
const uint8_t *buffer_end_p, /**< snapshot buffer end */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me hm, too. I didn't want to suggest the removal of this argument, although I see how my comment has led here. I should dive a bit deeper into snapshot functions to check the semantics behind buffer_end_p and literal_base_p, where and when and how they are used, why they are the same at all/most call sites, how can the literal data start after the snapshot buffer, etc. If they are really always equal then that may hint at a conceptual problem (or that may be too harsh, something that could be optimised, or named better, etc).

So I +1 now to no removal and to switch back to the original redundant arguments. (But someone could put this on a todo list and take a closer look.)

{
/* The input contains more than one input snapshot file, so we must merge them first. */
const char *error_p = NULL;
size_t merged_snapshot_size = jerry_merge_snapshots (snapshot_buffers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: this code could become simpler if jerry_merge_snapshots could deal with corner cases like number_of_files=1. I recall that there was some discussion about this back when the snapshot tool was first implemented. one more item on a todo list?

jerry_port_log (JERRY_LOG_LEVEL_ERROR, "Error: cannot open file: '%s'\n", literals_file_name_p);
jerry_cleanup ();
return JERRY_STANDALONE_EXIT_CODE_FAIL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if (...) { log and return } ... (without an else branch) style is used all over the code, and could also be used here: if (file_p == NULL) { bail out }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Removed 'jerry_parse_and_save_literals' and introduced
'jerry_get_literals_from_snapshot' instead which works
on snapshot buffers rather than source code. Added literal
saving feature to snapshot merge in the snapshot command
line tool. Also added missing 'jerry_cleanup()' calls to the
snapshot tool. Improved the console messages of the snapshot
tool.

Based on previous work of Tamas Zakor <ztamas@inf.u-szeged.hu>

JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com
@LaszloLango LaszloLango force-pushed the snapshot-merge-save-literals-dev branch from 1cafb83 to 8d25a6f Compare September 6, 2018 08:26
@LaszloLango
Copy link
Contributor Author

@zherczeg @akosthekiss the PR is updated.

Copy link
Member

@akosthekiss akosthekiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@akosthekiss akosthekiss merged commit 99c9a22 into jerryscript-project:master Sep 6, 2018
@LaszloLango LaszloLango deleted the snapshot-merge-save-literals-dev branch September 20, 2018 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Related to the public API enhancement An improvement snapshot Related to the snapshot feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants